
Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes


Abstract

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
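As a rough illustration of the policy-space learning step the abstract describes, the sketch below shows one generic form of approximate policy iteration in which each iteration labels sampled states with a rollout-best action and passes the labelled set to a classifier-style policy learner. All names here (`simulate`, `actions`, `learn_policy`, `sample_states`) and the rollout parameters are illustrative assumptions; the paper's actual system uses its own relational policy language and learner, which this sketch does not reproduce.

```python
def rollout_q(simulate, state, action, policy, horizon, n_rollouts, gamma=0.95):
    """Monte-Carlo estimate of Q(state, action) under `policy`.
    `simulate(s, a)` is assumed to return (next_state, reward)."""
    total = 0.0
    for _ in range(n_rollouts):
        s, r = simulate(state, action)
        ret, discount = r, gamma
        for _ in range(horizon):
            s, r = simulate(s, policy(s))
            ret += discount * r
            discount *= gamma
        total += ret
    return total / n_rollouts


def policy_space_api(simulate, actions, learn_policy, init_policy,
                     sample_states, n_iterations=10, horizon=20, n_rollouts=5):
    """Approximate policy iteration with learning in policy space:
    rather than fitting a value function, each iteration builds a training
    set of (state, rollout-best action) pairs and fits the improved policy
    directly with a supervised policy learner."""
    policy = init_policy
    for _ in range(n_iterations):
        training_set = []
        for s in sample_states():
            # Label the sampled state with the action that looks best
            # under rollouts of the current policy.
            best_a = max(actions(s),
                         key=lambda a: rollout_q(simulate, s, a, policy,
                                                 horizon, n_rollouts))
            training_set.append((s, best_a))
        policy = learn_policy(training_set)  # learning step in policy space
    return policy
```

In the goal-based planning domains the abstract mentions, where reward is extremely sparse, `init_policy` would come from the random-walk bootstrapping routine rather than from an uninformed starting policy.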

